Combinando semi-supervisão e hubness para aprimorar o agrupamento de dados em alta dimensão

Authors

  • Mateus C. de Lima
  • Maria Camila Nardini Barioni
  • Humberto Luiz Razente
Abstract

The curse of dimensionality makes the analysis of high-dimensional data a challenging task for data clustering techniques. To deal with high-dimensional data, this paper presents a clustering approach that combines two strategies: semi-supervision and density estimation based on hubness scores. Initial experimental results show good performance on real data sets with different characteristics.
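For readers unfamiliar with the term, a hubness score is commonly defined as the k-occurrence count N_k(x): the number of other points that include x among their k nearest neighbors. The sketch below illustrates only that standard definition with brute-force Euclidean distances. It is not the authors' algorithm; the function name `hubness_scores` and the toy data are assumptions made for illustration, and the abstract does not say how the scores feed into density estimation or semi-supervision.

```python
import numpy as np


def hubness_scores(X, k):
    """Return N_k for each row of X (hypothetical helper, not from the paper).

    N_k(x) counts how many other points list x among their k nearest
    neighbors under Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Brute-force pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff * diff, axis=2)
    # A point must not count as its own neighbor.
    np.fill_diagonal(d2, np.inf)
    # Indices of the k nearest neighbors of every point.
    knn = np.argsort(d2, axis=1, kind="stable")[:, :k]
    # Count how often each index appears across all neighbor lists.
    return np.bincount(knn.ravel(), minlength=n)


# Toy one-dimensional data (assumed values, chosen to avoid distance ties).
points = np.array([[0.0], [1.0], [2.5], [10.0]])
scores = hubness_scores(points, k=1)
print(scores)
```

Points with a high count are "hubs" that tend to sit in dense regions, while points with a count of zero are rarely anyone's neighbor. The scores always sum to n times k, since every point contributes exactly k neighbor entries.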

Similar resources

Nurses in post-operative heart surgery: professional competencies and organization strategies.

OBJECTIVE To analyze nurses' competencies with regard to their work in post-operative heart surgery and the strategies implemented to mobilize these competencies. METHOD This was an exploratory study with a qualitative approach and a methodological design of collective case study. It was carried out in three post-operative heart surgery units, consisting of 18 nurses. Direct observation and s...


Impacto da amostragem aleatória uniforme para o aumento da escalabilidade na geração de agrupamentos hierárquicos de séries espaço-temporais

This paper presents the results of a scalable approach to building hierarchical clusterings from space-time series. The goal is to reduce complexity in terms of space and time. The approach explores data sampling pre-processing techniques to reduce the numerosity of the data. The experiment indicates that strategies more efficient than the naive selection of samples need to be developed (...


FPCluster: Uma estratégia eficiente de agrupamento out-of-core sem medida de similaridade

Clustering is one of the most popular and relevant data mining tasks. Two challenges for determining clusters are the volume of data to be grouped and the difficulty in defining a similarity measure applicable to the entire data set. In this work we present FPCluster, a new clustering algorithm that addresses both problems. The algorithm developed is based on the out-of-core building of frequen...


Halite-ds: Agrupamento de Dados em Subespaços de Séries Temporais Multidimensionais

Given a data stream with many attributes, how to cluster similar events? For example, how to cluster measurements of tens of climatic attributes to aid in forecasting the climate and extreme events? The task of clustering data with many attributes is known as subspace clustering. Today, there exists a need for algorithms of this type well-suited to process data streams. This paper proposes the ...


Estabilidade de Classificadores de Decisão em Árvore Binária para Dados Imagem em Alta Dimensão

This paper deals with the problem of classifying high-dimensional image data using a multiple-stage classifier structured as a binary tree. The aim is to find the optimal structure for the binary tree in the sense of achieving a stable accuracy. The advantage of a multiple-stage classifier lies in the fact that only a sub-set of classes is considered at each s...




Publication date: 2016